Abstract


Recent advances in machine learning offer promising results for problems
involving large sets of data. To propose a tool for predicting less liquid
credit default swap (CDS) rates from CDS spreads observed over a long
period of time, we investigate different machine learning techniques and
employ several measures, such as the root mean square relative error, to
identify the technique best suited to this type of prediction in finance. It is
shown that the nearest neighbor method is not only accurate but also
desirable with respect to the elapsed time for building the model and
deploying it on unseen data.


AMS subject classifications (2020): Primary 91G80; Secondary 62J05. 
Keywords: Credit default swap (CDS); machine learning; prediction; liquidity; spread.


1 Introduction 


1.1 Credit default swap 


Using credit derivatives, market participants can transfer credit risk for a
portfolio of credits. The most important kind of credit derivative is the credit
default swap (CDS), which includes credit default index swap tranches,
credit default index swaps, basket swaps, and single-name swaps [6,
Chapter 1], [8]. To be more precise, a CDS is a bilateral over-the-counter
(OTC) derivative contract that enables two counterparties to buy and sell
protection on a given reference entity. It inherits the traditional swap
format and consists of two legs: (a) the premium leg and (b) the severity leg.
The protection buyer pays a regular fixed premium in return for receiving the
loss payment from the protection seller in case the reference entity defaults;
see, e.g., [22, Part 6], [26].

The CDS is a basic building block for many other derivative contracts and
methods. In this way, it is closely linked to the debit valuation adjustment
(DVA) and the credit valuation adjustment (CVA). Treating the CDS contract
as a continuous process/observation, its value can be defined as follows [20]:


$$\mathrm{CDS} = c \int_0^T e^{-(r+h)t}\,dt - h(1-R) \int_0^T e^{-(r+h)t}\,dt, \qquad (1)$$


where c, r, h, T, and R are the CDS coupon, zero coupon interest rate,
hazard rate, maturity of the contract, and the recovery rate, respectively. The
key driver of the CDS value is the hazard rate h = -c/(R - 1) = c/(1 - R),
which shows that the hazard rate is simply the CDS rate divided by the loss
fraction 1 - R.
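
To make the link between the coupon and the hazard rate explicit, the following short derivation may help. It assumes, as a simplification not spelled out in the text, that c, r, h, and R are constant and that the contract is valued at zero at inception, so that the two legs of the CDS value above offset each other.

```latex
\begin{align*}
c \int_0^T e^{-(r+h)t}\,dt &= h\,(1-R) \int_0^T e^{-(r+h)t}\,dt \\
\Longrightarrow\quad c &= h\,(1-R) \\
\Longrightarrow\quad h &= \frac{c}{1-R} = -\frac{c}{R-1}.
\end{align*}
```

For instance, under these assumptions a coupon c = 0.01 with recovery rate R = 0.4 gives h = 0.01/0.6, which is approximately 0.0167.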

CDS enables counterparties to manage and control credit exposure to a 
reference credit entity. Under the contract, on one side, the protection buyer 
(seller) pays (receives) a premium in return for credit loss compensation that 
is received (paid) when the reference entity defaults. In essence, the CDS is 
an insurance contract against reference entity default. Over the years, CDS 
has evolved in many directions and structural variations were proposed to 
adapt the contract to particular market needs, [4]. For example, the reference 
obligations can be a single entity, index, and baskets of a few names or larger 
pools. 

As an illustration of applicability, a company may adopt CDS spreads as a
strategic risk indicator for its risk dashboard. If the CDS rate, which is the
price for insuring against the default of a client, moves outside a specified
range, then mitigation steps can be taken to deal with the client’s increased
risk [12]. Recall that a CDS is basically a form of insurance that the buyer of,
say, a bond buys from a financial institution, say a bank, against the bond’s
going “bad” (not paying in full).

The CDS spread is the rate of payments that the buyer of the CDS makes
to the seller each year. For example, suppose the value of the bond is
$1,000. A bank might charge an annual amount of $10 per thousand if it felt
that the bond had a slightly less than 1% chance of going bad in the coming
year (because the $10 would include a commission). If the buyer paid $10, or
1% (the credit default swap rate), the buyer would be purchasing the right
to sell the bond back to the bank for $1,000, no matter what the bond was
really worth. This would be the “swap,” and its purpose would be to protect
against credit default. Note that in practice, CDS contracts pay at regular
intervals, typically quarterly.

In addition, quanto CDSs are denominated in a given currency to furnish
protection upon default of a certain entity [23]. There are some instances,
such as for systemically important companies or for sovereign entities, in
which an investor considers purchasing protection in a currency other than
the one in which the reference entity’s assets are denominated. Quanto CDS
spreads are the differences between the CDS premiums of the same reference
entity in different currency denominations [23].


1.2 Motivation 


In this paper, predictive analysis based on machine learning (ML) [25] is
discussed for groups of actively traded, liquid CDS rates observed daily,
which are then employed to predict less liquid CDS rates. This is the main
motivation of the paper, since this application is mostly observed in the
equity or credit markets, where liquidity drives specific instruments into
certain categories [21], [28, Chapter 9].


1.3 ML 


ML uses statistical methods to train machines on a given data set. After
learning, the systems produce optimized models that explain the data in
the best way and restrict potential biases, further enabling better
assessments and decision making. Thus, such models are also broadly
employed for predictions. Learning here is considered a process of
progressive adaptation, that is, the ability to produce the right patterns in
response to a given set of inputs [15, Chapter 6].


Here, an application of the ML approach in financial mathematics is
discussed; see the book [13, Part 18.8] for some background. It is shown how
we can predict less liquid CDS spreads from a list of actively traded and
liquid CDS spreads observed daily over several years.

ML is useful in finding patterns, in contrast to traditional linear models
[16, Chapter 1]. Methods and tools like nearest neighbors (NeN),
neural networks (NN), or decision trees (DT) offer greater flexibility in
finding complex relationships. In fact, prediction, as a technique to
approximate an outcome from supporting features, offers practical solutions
in economics and finance, where such approximations can be invaluable.
Inflation prediction, marketing campaign model testing, growth rate
forecasting, and market data generation are just a few instances where ML
becomes a necessary tool in making decisions [5, 27]. Additionally,
reasonable CPU times and predictive ability make ML a more promising
approach than the traditional regression-like methods.


1.4 Contribution 


Here, we contribute an ML-based model by comparing several well-known ML
methods and identifying the best one when the model has many features,
which are CDS rates. More precisely, we seek the best model when the
number of CDS rates is 5, 8, or 10 over a period of 5 to 10 years. ML is
used because of the existence of large sets of data. It is discussed and
illustrated that the nearest neighbor prediction method performs best as the
size of the original data set grows. In the presence of such a complex, large
set of financial data, it will be observed that regression-type methods, which
are classical statistical tools, are no longer adequate for tackling such
problems. This paper also follows the recent works [21, 30]. We show that
ML furnishes an efficient method to solve this challenge in finance.
The advantages of this study comprise:


• Considering many CDS spreads as features for the ML techniques to
do the prediction.


• Implementing and comparing several well-known ML predictors to
obtain a method for predicting CDS rates.


• Furnishing insights into the applicability and accuracy of ML-based
economic models for predicting CDS rates.


1.5 Structure of the paper 


Following this introductory discussion of CDS rates and ML, the remaining
sections of this article are structured as follows. In Section 2, predictive
analytics (PA) is introduced; it comprises different statistical methods from
ML, predictive modeling, and data mining that investigate historical and
current facts to forecast unknown future events. Section 3 describes how the
proposed procedure can be applied to a large set of financial data, in which
each series has more than a thousand observations. It is shown by way of
illustration that the proposed prediction method is useful and provides
promising results. Several numerical experiments are investigated with
implementation details in Section 4 to confirm the applicability of the ML
methods and to compare several well-known and state-of-the-art prediction
methods from the literature. Lastly, several concluding remarks are made in
Section 5.


2 Predictive analytics


PA is a term mostly used for analytical and statistical methods [25]
that forecast the future by investigating historical and current data.
Forthcoming occurrences and the behavior of variables can be predicted by
PA models, which furnish a score. A lower score indicates a lower likelihood
of occurrence of the event, and a higher score indicates a higher likelihood.
Such techniques evaluate transactional and historical data patterns to find
solutions to many scientific problems. These models are useful in recognizing
the opportunities and risks for each manager, employee, or customer [11, 19].


Statistically speaking, the problem of prediction reduces to finding the
conditional distribution of a variable y given other variables
$x = (x_1, x_2, \ldots, x_n)$. In the terminology of data science, the variables
x are called features. For the calibrated conditional distribution, the point
prediction of y is generally the most probable value (mean) [14].


It is well known that the most common tool is (linear) regression analysis.
ML offers a richer set of tools that can usefully summarize different
types of nonlinear relations in economic data. Building a good predictor
involves deriving a function that minimizes an error function. The target
of prediction methods is then to obtain good out-of-sample approximations
for unseen data. This is not trivial: regressions are generally known to be
weak at out-of-sample prediction, which is tied to overfitting issues
(especially for regression-like methods of higher orders) [7, Chapter 3].


Basically, the targets of an ML predictive modeling task are twofold: to
return a high-performing predictive model for operational use and an
estimate of its performance [9]. The process consists of the following
stages: (a) tuning, in which various combinations of methods and their
hyper-parameter values are tried; (b) producing a final model trained on all
existing data with the best configuration; and (c) performance estimation.
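
The three stages can be sketched on toy data as follows. This is a minimal Python illustration, not the Mathematica workflow used in this work. A nearest neighbor average stands in for the predictor, the helper names knn_predict and mean_abs_error are hypothetical, and the data are made up. As a further assumption, the final model pools only the training and validation parts, so that the test part stays unseen.

```python
import math


def knn_predict(train, query, k):
    """Average target of the k training points nearest to query (Euclidean)."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    return sum(t for _, t in nearest) / len(nearest)


def mean_abs_error(train, held_out, k):
    """Mean absolute prediction error on held-out (features, target) pairs."""
    return sum(abs(knn_predict(train, x, k) - y) for x, y in held_out) / len(held_out)


# Toy one-feature data following y = 2x, made up for illustration only.
points = [((i / 10,), 2 * i / 10) for i in range(30)]
train, valid, test = points[0::3], points[1::3], points[2::3]

# (a) Tuning: choose the neighbor count with the smallest validation error.
best_k = min((1, 2, 3), key=lambda k: mean_abs_error(train, valid, k))
# (b) Final model: for a nearest neighbor predictor this is just the pooled data.
final_data = train + valid
# (c) Performance estimation on data not used in (a) or (b).
test_error = mean_abs_error(final_data, test, best_k)
print(best_k, round(test_error, 3))
```

Keeping the test part out of stages (a) and (b) is what makes the error in stage (c) an estimate for unseen data.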


3 Problem set-up 


CDS indices are tradable products that permit investors to take short or long
credit risk positions in certain equity markets or segments thereof. Here the
data are assumed, in general, to be independent and identically distributed.


3.1 Data


Let us consider n CDSs that serve as features; see, e.g., [31].
In practice these CDS rates are fed from the market, but in this work we
employ simulated values based on a correlation matrix as follows:

$$\Lambda = (\lambda_{i,j})_{(n+1)\times(n+1)}, \quad \lambda_{i,j} = \lambda_{j,i}, \quad \lambda_{i,i} = 1, \qquad (2)$$

where its entries are obtained via uniform distributions, subject to the
volatility vector $V = (v_1, v_2, \ldots, v_{n+1})$, $0 < v_i < 1$. Then the
simulated data are extracted from the variance-covariance matrix

$$CM = (v_i v_j \lambda_{i,j})_{(n+1)\times(n+1)}, \qquad (3)$$

which is a symmetric positive definite (SPD) matrix. The financial data sets
can be constructed in the software system Wolfram Mathematica [1, 2] as
follows:


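n = 5; (* number of feature spreads: 5, 8, or 10 *)
SeedRandom[12];
(* volatilities, sorted in decreasing order *)
volatility = Reverse@Sort@RandomReal[{0, 0.1}, n + 1];
(* symmetric matrix with unit diagonal, cf. (2) *)
crl = RandomReal[{0, 0.5}, {n + 1, n + 1}];
crl = (1/2) (crl + Transpose[crl]);
crl = crl.Transpose[crl];
Table[crl[[i, i]] = 1, {i, n + 1}];
(* variance-covariance matrix, cf. (3) *)
cm = Table[volatility[[i]]*volatility[[j]]*crl[[i, j]],
   {i, 1, Length[volatility]}, {j, 1, Length[volatility]}];

initial = RandomReal[{0.5, 1.5}, n + 1];
max = 1825; (* number of days in the time period, e.g. 1825 for T = 5 *)
mean1 = ConstantArray[0, n + 1]; (* zero mean vector of length n + 1 *)
mn = MultinormalDistribution[mean1, cm];
data2 = RandomVariate[mn, max]; (* daily increments *)
data1 = Prepend[data2, initial];
data = Accumulate[data1]; (* cumulative sums give the spread paths *)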


We have chosen SeedRandom[12] on purpose to let readers reproduce the
financial data set. The daily increments follow a multivariate normal
(Gaussian) distribution with mean vector 0 and variance matrix (3), and the
data are their cumulative sums starting from the initial vector. We now
divide the data into several different scenarios:


• n = 5, 8, 10 spreads.


• T = 5, 10 years, which roughly correspond to 1825 and 3650 days,
respectively.
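
As a rough illustration of the simulation just described, the following Python sketch (not the Mathematica code of this work) draws correlated Gaussian daily increments and accumulates them into spread paths. The function names random_correlation, cholesky, and simulate_spreads are hypothetical. The correlation matrix is built here as a normalized Gram matrix, a swapped-in construction chosen because it guarantees a unit diagonal and positive definiteness. Python's random stream differs from that of SeedRandom[12], so the numbers will not match the data of this paper.

```python
import math
import random


def random_correlation(dim, rng):
    """Random symmetric matrix with unit diagonal (normalized Gram matrix)."""
    b = [[rng.uniform(0.0, 0.5) for _ in range(dim)] for _ in range(dim)]
    gram = [[sum(b[i][k] * b[j][k] for k in range(dim)) for j in range(dim)]
            for i in range(dim)]
    return [[gram[i][j] / math.sqrt(gram[i][i] * gram[j][j]) for j in range(dim)]
            for i in range(dim)]


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive definite a."""
    dim = len(a)
    low = [[0.0] * dim for _ in range(dim)]
    for i in range(dim):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def simulate_spreads(n, days, seed=12):
    """Return days + 1 rows of n + 1 spreads (cumulative Gaussian increments)."""
    rng = random.Random(seed)
    dim = n + 1
    vol = sorted((rng.uniform(0.0, 0.1) for _ in range(dim)), reverse=True)
    corr = random_correlation(dim, rng)
    cov = [[vol[i] * vol[j] * corr[i][j] for j in range(dim)] for i in range(dim)]
    low = cholesky(cov)
    row = [rng.uniform(0.5, 1.5) for _ in range(dim)]  # initial spreads
    path = [row]
    for _ in range(days):
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Correlated increment: lower-triangular factor times independent normals.
        step = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(dim)]
        row = [x + dx for x, dx in zip(row, step)]
        path.append(row)
    return path


data = simulate_spreads(n=5, days=1825)
print(len(data), len(data[0]))
```

With n = 5 and 1825 days, the sketch returns 1826 rows of 6 columns, that is, the initial vector followed by one row per simulated day.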


Financial data sets with higher dimensions and more features are a recent
challenge. Usually, many data sets have features carrying similar information.
This can act as noise in the system and increase the complexity. Note that
data are called fat when there are many features relative to observations, and
tall when there are many observations relative to features. Figure 1 shows a
sample set of data for n = 5, 8, 10 spreads when T = 5.


Figure 1: Five-year CDS spreads for n = 5, 8, 10 in the top, middle, and bottom panels, respectively.


3.2 The sub-sets 


The aim of this subsection is to apply different prediction routines in
the ML environment to obtain a model and then employ it on the
out-of-sample domain [3, 21]. We first break the in-sample data (obtained
from the CDS markets in practice) into three standard subsets: (i) training,
(ii) validation, and (iii) testing sets. The training, validation, and testing
subsets are roughly 60%, 20%, and 20% of the whole original set, respectively.
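
A minimal Python sketch of such a split is given below. The function name split_examples, the seed, and the rule of truncating the subset sizes to integers are assumptions made for illustration, and the integer list is only a placeholder for the actual rows of the data set.

```python
import random


def split_examples(examples, train=0.6, valid=0.2, seed=1234):
    """Shuffle and split examples into training, validation, and test lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(train * len(shuffled))
    n_valid = int(valid * len(shuffled))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])


examples = list(range(1826))  # placeholder for 1826 (features -> target) rows
train_set, valid_set, test_set = split_examples(examples)
print(len(train_set), len(valid_set), len(test_set))
```

For 1826 rows, this truncation rule yields subsets of 1095, 365, and 366 rows.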


Here the data are in numerical vector format, and only the first three
members of the training set are given to illustrate how the prediction
routine is going to be incorporated (n = 10):


Training set = { 

{0.839768, 1.48679, 0.710729, 1.48696, 0.713219, 
0.923109, 1.46489, 1.43977, 1.23923, 1.29405} -> 1.2654, 

{0.961012, 1.48665, 0.78337, 1.54342, 0.696918, 0.958177, 
1.51974, 1.45154, 1.25072, 1.3013} -> 1.26877, 

{0.846854, 1.4823, 0.676574, 1.56933, 0.661563, 0.944591, 
1.51674, 1.4319, 1.23782, 1.28701} -> 1.26747 

}


3.3 Predictors


Classification methods are not needed for our purpose, and we focus on
prediction methods. The predictor routines used for comparisons in this work
are given as follows:


• RF: Random forest is an ensemble learning method for regression and
classification that builds a multitude of decision trees (DT). The
prediction of the forest is obtained by averaging the tree predictions or
by taking the most common class. Every DT is trained on a random subset
of the training set and employs only a random subset of the features.


• DT: A DT is a flow chart-like structure in which each internal node
represents a test on a feature, each branch represents the test’s outcome,
and each leaf represents a probability density, value distribution, or class
distribution.


• GBT: The gradient boosted tree is an ML technique for classification
and regression problems that provides a prediction model as an ensemble
of trees. The trees are trained sequentially with the aim of compensating
for the weaknesses of previous trees. The Light Gradient Boosting
Machine framework is used in the back end.


• LR: Linear regression predicts the numerical value y employing
a linear combination of numerical features $x = \{x_1, x_2, \ldots, x_n\}$. The
conditional probability P(y|x) is modeled based on


$$P(y \mid x) \propto \exp\left( -\frac{(y - f(\theta, x))^2}{2\sigma^2} \right), \qquad (4)$$


with $f(\theta, x) = x\theta$.


• NN: A neural network comprises stacked layers, each doing a simple
calculation. The information is processed layer by layer from the
input layer to the output layer. A loss function is minimized to train
the NN on the training set employing gradient descent.


• NeN: Nearest neighbors is a kind of instance-based learning; in its
simplest form, it takes the average of the values among the k nearest
neighbors or the most common class. To generate short-term forecasts,
similar patterns of behavior are located by a distance measure, which is
normally the Euclidean distance. The time evolution of these nearest
neighbors is exploited to obtain the required forecast. Thus, the
procedure employs only local information to predict and makes no effort
to fit a model to the whole time series at once. The selection of the size
(m), normally called the embedding dimension, and of the number of
neighbors (k) is a fundamental point of this routine [10]. Note that there
is no training step in NeN.


Iran. j. numer. anal. optim., Vol. 13, No. 1, pp 19-37
Finding an efficient machine learning predictor for lesser liquid credit ...


• GP: The Gaussian process method assumes that the model is generated
by a Gaussian process. This process is defined by its covariance function,
whose parameters are estimated in the training phase. The process is then
conditioned on the training data and employed to infer the value of a new
instance by Bayesian inference.
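
The simplest form of the NeN predictor described in the list above, averaging the targets of the k nearest training points, can be sketched in Python as follows. This is not the built-in Mathematica routine, which also applies distribution smoothing and a k-D tree search. The function name knn_predict and the toy data are made up for illustration.

```python
import math


def knn_predict(train, query, k=2):
    """Average the targets of the k training points closest to query (Euclidean)."""
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], query))
    nearest = by_distance[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Toy (features -> target) pairs, made up for illustration only.
train = [
    ((1.00, 2.00), 1.20),
    ((1.10, 2.10), 1.30),
    ((3.00, 0.50), 0.40),
    ((2.90, 0.60), 0.50),
]
print(knn_predict(train, (1.05, 2.05), k=2))
```

No model is fitted here. Every prediction sorts the whole training set by distance, which is why the cost of this approach sits at prediction time rather than at training time.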


With a large set of data available, we assign 20% of the data in each
scenario to the validation set. A separate validation set is typically
employed when the data in the training set and the data that we want to
forecast arise from different sources. Using it, the hyper-parameters are
selected by testing performance on held-out data.


In addition, the predictors are set up and trained on the CPU as the target
device, and we use 1234 as the random seed whenever one is required inside
the predictors. Our prediction routine is obtained as a predictor function.
For instance, in the case of NeN when n = 10, T = 5, it is displayed as
follows:


PredictorFunction[
   Input type: NumericalVector (length: 10)
   Method: NearestNeighbors
   Number of training examples: 1095
]


To illustrate the results of comparisons, we provide Table 1 for one run
in the case n = 5, T = 5, in which the numbers of training examples,
validation set examples, and test set instances are 1095, 365, and 366,
respectively. Since we employ the built-in functions for the predictor
routines in Mathematica, all the hyper-parameters have been assigned
automatically by the built-in optimization techniques, which choose the best
hyper-parameters for the model corresponding to the training and validation
sets.


Predictors are compared with respect to their CPU times (seconds) in
Table 2. The time reported is the running CPU time for constructing
the models. A critical bottleneck in using the NN is its high computational
time for constructing the model. This restricts its applicability for the
purpose of our financial application in predicting less liquid CDS rates. In
other words, the larger the financial data set (of this type), the larger the
CPU time is. Therefore, it becomes necessary to rely on alternative
predictors.

Meanwhile, for instance-based predictors such as NeN, the testing stage
requires comparing the test vector with all existing data points in the data
set, which might take a significant amount of time.


4 Benchmarking


The implementations in this paper were performed in Mathematica 12.0 [17,
Chapter 7] on a computer with a Core i7-9750H processor, an SSD,
and 16 GB of RAM. It is important to mention that the general results and
conclusions are obtained by shuffling the data set 50 times and taking
means whenever required.


4.1 Implementation details 


We can apply the predictive routines defined in Subsection 3.3 to obtain
the forthcoming value in the out-of-sample domain (for unseen data). In
ML, the validation set is used to tune the model parameter settings, and a
test set is used to evaluate the model’s performance on unseen events.
Prediction with our ML methods on the large data sets then amounts to
applying a predictor function to the unseen data.


Table 1: Information of compared predictors for n = 5 and T = 5.


Routine | Sub-method parameters | Single evaluation time (ms/example) | Batch evaluation speed (example/ms) | Loss | Model memory (kB) | Training time (s)
RF  | Feature fraction = 1/3, Leaf size = 4, Tree number = 100 | 8.4 | 12.9 | -1.83 ± 0.01 | 605 | 1.3
DT  | Feature fraction = 1, Distribution smoothing = 1 | 1.14 | 487. | -1.43 ± 0.08 | 121 | 0.6
GBT | Leaves number = 60, Learning rate = 0.2, Leaf size = 7 | 4.04 | 44.0 | +2.63 ± 0.26 | 723 | 10.93
LR  | L1 regularization = 0, L2 regularization = 100.0, Optimization method = Normal equation | 1.31 | 362. | -2.61 ± 0.03 | 307 | 3.72
NN  | Network depth = 2, Max training rounds = 30 | 2.98 | 25.1 | -2.95 ± 0.04 | 471 | 38.1
NeN | Neighbors number = 2, Distribution smoothing = 0.5, Nearest method = k-D tree | 1.11 | 194. | +0.5 ± 0.49 | 174 | 0.6
GP  | Estimation method = Maximum posterior, Search method = Simulated Annealing | 3.83 | 9.7 | -0.98 ± 0.26 | 6.3 | 5.6
Table 2: Comparisons of running times (in seconds) for various routines to construct the model.


n  | T  | RF   | DT   | GBT   | LR   | NN   | NeN  | GP
5  | 5  | 1.3  | 0.6  | 10.93 | 3.72 | 38.1 | 0.6  | 5.6
8  | 5  | 0.8  | 0.6  | 10.8  | 3.7  | 34.6 | 0.6  | 5.2
10 | 5  | 1.2  | 0.69 | 21.5  | 3.79 | 86.  | 0.7  | 5.9
5  | 10 | 1.4  | 1.29 | 13.0  | 4.42 | 85.  | 1.32 | 31.1
8  | 10 | 1.81 | 1.30 | 13.0  | 4.3  | 158. | 1.40 | 33.8
10 | 10 | 1.87 | 1.31 | 17.2  | 4.06 | 419  | 1.41 | 31.3


To compare various routines for prediction, two different measures are
employed as described below. The absolute error is calculated via


$$E = \| P_{\mathrm{predict}} - P_{\mathrm{actual}} \|_{\infty}, \qquad (5)$$


where $P_{\mathrm{actual}}$ and $P_{\mathrm{predict}}$ are the exact and predicted values, respectively.
Also, the root mean square relative error (RMSRE) of N predicted values
$P_{\mathrm{predict}}$, whose real values are $P_{\mathrm{actual}}$, is defined by


$$\epsilon = \sqrt{ \sum_{i=1}^{N} \frac{1}{N} \left( \frac{P^{(i)}_{\mathrm{predict}} - P^{(i)}_{\mathrm{actual}}}{P^{(i)}_{\mathrm{actual}}} \right)^{2} }. \qquad (6)$$


Here N is the length of each subset. For this analysis, we compared E
and $\epsilon$ for the training, validation, and test sets in each scenario. The
routines have not seen the test sets before, and the accuracies obtained on
the test sets are important for finding the most useful algorithm
for prediction. The results, each a mean over 50 shuffles of the original
set containing the training, validation, and test (prediction) sets, are
gathered in Table 3. It is important to state that for some routines, such
as NeN, there is no training part and the model representation is the entire
training data set [24, Chapter 7]. The training set in Table 3 for such methods
is in fact the calibration set.
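
The two measures can be computed as in the following Python sketch, which reads (5) as the maximum absolute error and (6) as the RMSRE. The function names max_abs_error and rmsre and the sample values are made up for illustration.

```python
import math


def max_abs_error(predicted, actual):
    """Infinity norm of the prediction error, as in (5)."""
    return max(abs(p - a) for p, a in zip(predicted, actual))


def rmsre(predicted, actual):
    """Root mean square relative error, as in (6)."""
    n = len(actual)
    return math.sqrt(sum(((p - a) / a) ** 2 for p, a in zip(predicted, actual)) / n)


actual = [1.00, 2.00, 4.00]       # made-up values for illustration
predicted = [1.10, 1.90, 4.00]
print(max_abs_error(predicted, actual), rmsre(predicted, actual))
```

Because the RMSRE divides each error by the actual value, it is only meaningful when the actual values stay away from zero, as is the case for the simulated spreads here.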


Remark 1. Note that RF is an ensemble DT model based on different feature
selections and data set partitions, and it does not require cross-validation.
However, it is used here in the same fashion as the other models.
This is mainly to check the robustness of the models when the input data
change: the more general the model, the less susceptible it is to data
variation. Since the models need to be cross-validated, and in order to have
fair comparisons, we compute the mean over 50 shuffles of the original set.


Also note that KNN (that is, NeN) makes predictions using the training data set directly.


Table 3: Comparison of mean results among different methods: absolute error E, Eq. (5), and RMSRE, Eq. (6).


Scenario | Routine | E: Training set | E: Validation set | E: Test set | RMSRE: Training set | RMSRE: Validation set | RMSRE: Test set
n = 5, T = 5 | RF | 5.5×10^-2 | 4.4×10^-2 | 4.6×10^-2 | 9.3×10^-3 | 9.7×10^-3 | 9.8×10^-3
 | DT | 6.6×10^-2 | 6.5×10^-2 | 6.5×10^-2 | 1.0×10^-2 | 1.1×10^-2 | 1.1×10^-2
 | GBT | 2.9×10^-2 | 3.9×10^-2 | 3.9×10^-2 | 3.2×10^-3 | 5.6×10^-3 | 5.5×10^-3
 | LR | 5.2×10^-2 | 5.0×10^-2 | 5.0×10^-2 | 1.3×10^-2 | 1.3×10^-2 | 1.3×10^-2
 | NN | 2.6×10^-2 | 2.4×10^-2 | 2.6×10^-2 | 4.0×10^-3 | 4.8×10^-3 | 4.9×10^-3
 | NeN | 3.1×10^-2 | 3.3×10^-2 | 3.4×10^-2 | 2.6×10^-3 | 4.2×10^-3 | 4.3×10^-3
 | GP | 2.4×10^-2 | 2.7×10^-2 | 2.6×10^-2 | 3.3×10^-3 | 4.7×10^-3 | 4.7×10^-3
n = 8, T = 5 | RF | 2.6×10^-2 | 2.6×10^-2 | 2.6×10^-2 | 7.1×10^-3 | 7.2×10^-3 | 7.2×10^-3
 | DT | 5.4×10^-2 | 5.1×10^-2 | 5.0×10^-2 | 8.1×10^-3 | 8.9×10^-3 | 8.9×10^-3
 | GBT | 1.9×10^-2 | 2.6×10^-2 | 2.6×10^-2 | 2.1×10^-3 | 4.1×10^-3 | 4.1×10^-3
 | LR | 4.4×10^-2 | 4.3×10^-2 | 4.3×10^-2 | 1.1×10^-2 | 1.1×10^-2 | 1.1×10^-2
 | NN | ?.7×10^-2 | 1.6×10^-2 | 1.7×10^-2 | 2.7×10^-3 | 3.4×10^-3 | 3.4×10^-3
 | NeN | 1.6×10^-2 | 1.8×10^-2 | 1.8×10^-2 | 2.0×10^-3 | 3.1×10^-3 | 3.1×10^-3
 | GP | ?.7×10^-2 | 1.7×10^-2 | 1.7×10^-2 | 1.6×10^-3 | 3.3×10^-3 | 3.3×10^-3
n = 10, T = 5 | RF | ?.7×10^-2 | 1.7×10^-2 | 1.6×10^-2 | 6.0×10^-3 | 5.8×10^-3 | 5.8×10^-3
 | DT | 3.6×10^-2 | 3.2×10^-2 | 3.3×10^-2 | 6.2×10^-3 | 6.7×10^-3 | 6.7×10^-3
 | GBT | 1.2×10^-2 | 1.4×10^-2 | 1.3×10^-2 | 1.1×10^-3 | 2.1×10^-3 | 2.2×10^-3
 | LR | ?.8×10^-2 | 1.7×10^-2 | 1.7×10^-2 | 4.6×10^-3 | 4.7×10^-3 | 4.7×10^-3
 | NN | 7.0×10^-3 | 7.1×10^-3 | 7.3×10^-3 | 1.1×10^-3 | 1.5×10^-3 | 1.5×10^-3
 | NeN | ?×10^-2 | 1.0×10^-2 | 1.1×10^-2 | 1.4×10^-3 | 2.1×10^-3 | 2.1×10^-3
 | GP | ?.7×10^-2 | 1.3×10^-2 | 1.5×10^-2 | 2.0×10^-3 | 2.3×10^-3 | 2.4×10^-3
n = 5, T = 10 | RF | 5.0×10^-2 | 4.8×10^-2 | 5.0×10^-2 | 9.5×10^-3 | 9.7×10^-3 | 9.7×10^-3
 | DT | 7.0×10^-2 | 6.6×10^-2 | 6.5×10^-2 | 9.9×10^-3 | 1.0×10^-2 | 1.0×10^-2
 | GBT | 3.5×10^-2 | 4.1×10^-2 | 4.2×10^-2 | 4.0×10^-3 | 5.6×10^-3 | 5.6×10^-3
 | LR | 6.8×10^-2 | 6.6×10^-2 | 6.7×10^-2 | 1.5×10^-2 | 1.5×10^-2 | 1.5×10^-2
 | NN | 2.7×10^-2 | 2.7×10^-2 | 2.8×10^-2 | 4.0×10^-3 | 4.7×10^-3 | 4.7×10^-3
 | NeN | 1.9×10^-2 | 2.9×10^-2 | 2.8×10^-2 | 1.8×10^-3 | 3.8×10^-3 | 3.4×10^-3
 | GP | 2.7×10^-2 | 2.8×10^-2 | 2.9×10^-2 | 3.2×10^-3 | 4.4×10^-3 | 4.4×10^-3
n = 8, T = 10 | RF | 3.7×10^-2 | 3.9×10^-2 | 3.9×10^-2 | 1.2×10^-2 | 1.2×10^-2 | 1.2×10^-2
 | DT | 7.2×10^-2 | 7.2×10^-2 | 6.9×10^-2 | 1.1×10^-2 | 1.1×10^-2 | 1.1×10^-2
 | GBT | 2.5×10^-2 | 3.0×10^-2 | 3.0×10^-2 | 3.4×10^-3 | 4.9×10^-3 | 4.9×10^-3
 | LR | 6.5×10^-2 | 6.3×10^-2 | 6.3×10^-2 | 1.6×10^-2 | 1.6×10^-2 | 1.6×10^-2
 | NN | 1.9×10^-2 | 2.1×10^-2 | 2.0×10^-2 | 3.3×10^-3 | 4.0×10^-3 | 3.9×10^-3
 | NeN | 1.3×10^-2 | 1.9×10^-2 | 2.0×10^-2 | 1.6×10^-3 | 3.0×10^-3 | 3.0×10^-3
 | GP | 2.0×10^-2 | 2.0×10^-2 | 2.1×10^-2 | 2.5×10^-3 | 3.7×10^-3 | 3.7×10^-3
n = 10, T = 10 | RF | 3.1×10^-2 | 3.1×10^-2 | 3.2×10^-2 | 1.1×10^-2 | 1.1×10^-2 | 1.1×10^-2
 | DT | 7.7×10^-2 | 7.1×10^-2 | 6.7×10^-2 | 8.9×10^-3 | 9.6×10^-3 | 9.5×10^-3
 | GBT | 2.0×10^-2 | 2.1×10^-2 | 2.3×10^-2 | 1.7×10^-3 | 2.7×10^-3 | 2.7×10^-3
 | LR | 2.9×10^-2 | 2.8×10^-2 | 2.8×10^-2 | 8.0×10^-3 | 8.1×10^-3 | 8.0×10^-3
 | NN | 1.3×10^-2 | 1.1×10^-2 | 1.2×10^-2 | 1.7×10^-3 | 2.0×10^-3 | 2.0×10^-3
 | NeN | 8.6×10^-3 | 1.1×10^-2 | 1.1×10^-2 | 1.1×10^-3 | 2.0×10^-3 | 2.0×10^-3
 | GP | 1.1×10^-2 | 1.0×10^-2 | 1.0×10^-2 | 1.7×10^-3 | 1.9×10^-3 | 1.9×10^-3
Figure 2: NeN results on the test subset for one run of the scenario n = 10 and T = 5:
actual and predicted test set values (left) and the associated log absolute errors (right).
See the online color version for further clarification.


4.2 Quantitative results 


The large data sets considered for checking and comparing the presented
models have about 1825 and 3650 members (T = 5 and T = 10), each
comprising 5, 8, or 10 feature spreads.

The results of comparisons for the unseen CDS rates (the test sets) are
provided in Table 3. We observe that all the predictors replicated the training
sets quite well. However, considering both the numerical accuracy and the
CPU time, the best predictor is mostly NeN. To save space and avoid
repeating similar figures, Figure 2 is provided only for the scenario
n = 10, T = 5 to show that NeN furnishes promising predictions on the test
(prediction) subset.

We visualize the scatter plot of the test values as a function of the
predicted values in Figure 3 for one run. This illustrates again that
the size of the input financial data and their types have a clear effect on
the choice of predictor in ML. We have illustrated the prediction with CDS
data and shown non-regression tools to be better techniques in PA.

It is well known that data correlation is the degree to which one data set
corresponds to another data set [29]. In ML, we can think of how
the features correspond to the output. Data visualization and correlation
may help decide which ML method to use. Accordingly, Table 4 provides the
means of the correlations between the actual and predicted values of
different methods, over 50 runs of the shuffled data and under several
scenarios, to help in obtaining the best method for our CDS problem. Note
that small values may not necessarily represent a bad correlation as long as
the set of data has a large, statistically significant correlation. When we
have data sets with many features, the data correlation is of clear
importance.
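
As a small illustration, the correlation between actual and predicted values can be computed as in the following Python sketch. It assumes that the Pearson coefficient is the intended measure, which the text does not state explicitly. The function name pearson and the sample values are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


actual = [1.20, 1.25, 1.22, 1.30, 1.28]      # made-up values for illustration
predicted = [1.21, 1.24, 1.23, 1.29, 1.27]
print(round(pearson(actual, predicted), 3))
```

A value close to 1 indicates that the predictions move together with the actual values, although it says nothing about a constant offset between them; this is why the error measures are reported alongside the correlations.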

Taken together, the results given in Tables 1-4 reveal that NeN is the best
ML routine that can be considered for the prediction of less liquid CDS rates
under the format of the financial data described in Section 3, both in terms
of accuracy and of elapsed CPU time. NeN is a non-parametric, non-linear
forecasting routine, which relies on pieces of the time series in the past
that may resemble pieces in the future. This is useful in finance; see
also [18]. For NeN, no learning of a model is needed, and all of the work
happens at the time a prediction is requested. Recall that the number of
neighbors selected, the length of the series, and the embedding dimension
used to perform predictions are chosen automatically by the built-in
routines of our programming language.


Figure 3: Comparison of perfect prediction line and predictions (left; actual value
against predicted value) and the residual plot (right; actual value - predicted value
against predicted value) in the NeN routine for the scenario n = 10 and T = 5.


Table 4: Correlation between the actual and predicted values.


n  | T  | RF   | DT   | GBT  | LR   | NN   | NeN  | GP
5  | 5  | 0.97 | 0.96 | 0.99 | 0.95 | 0.99 | 0.99 | 0.99
8  | 5  | 0.97 | 0.96 | 0.99 | 0.93 | 0.99 | 0.99 | 0.99
10 | 5  | 0.97 | 0.97 | 0.99 | 0.98 | 0.99 | 0.99 | 0.99
5  | 10 | 0.97 | 0.96 | 0.99 | 0.92 | 0.99 | 0.99 | 0.99
8  | 10 | 0.97 | 0.97 | 0.99 | 0.95 | 0.99 | 0.99 | 0.99
10 | 10 | 0.97 | 0.98 | 0.99 | 0.98 | 0.99 | 0.99 | 0.99

Computational evidence from Table 3 reveals that NN, NeN, and GP have the
best performance on the unseen test sets over 50 shuffles. However,
accuracy is not the main factor in choosing the best routine when the
computational CPU times differ. Clearly, when the accuracies are almost
the same, the lower the CPU time of training, the more useful the method.
For this reason, and based on Table 2, the best performance belongs
to NeN in terms of computational time. Thus, we have found NeN to be a
model that can be run automatically, without any theoretical
assumptions, through the process of progressive adaptation.

We end this subsection by pointing out that some other models that fit time
series data well, such as RNN and LSTM, could also be used for comparison.
For such routines, we recall that, e.g., LSTMs take longer
to train, require more memory to train, are easy to overfit, and are
sensitive to different random weight initializations, and that dropout is
much harder to implement in LSTMs. Besides, our data are not stock prices,
and NeN already performed best in terms of running time as well as accuracy.
Thus, a comparison with these other solvers is not considered necessary here.


4.3 Results on unseen data 


Finally in this section, we illustrate how NeN, as an efficient ML technique
for predicting less liquid CDS rates, can be applied to totally new unseen
data. Taking as the predictor the NeN routine obtained previously for
n = 10 and T = 10 and saved as prediction, the data for this verification
are simulated as follows:


n2 = 10; (* number of features expected by the predictor *)
SeedRandom[12345];
volatility2 = Reverse@Sort@RandomReal[{0, 0.1}, n2];
crl2 = RandomReal[{0, 0.5}, {n2, n2}];
crl2 = (1/2) (crl2 + Transpose[crl2]);
crl2 = crl2.Transpose[crl2];
Table[crl2[[i, i]] = 1, {i, n2}]; (* unit diagonal; index runs to n2 *)
cm2 = Table[volatility2[[i]]*volatility2[[j]]*crl2[[i, j]],
   {i, 1, Length[volatility2]}, {j, 1, Length[volatility2]}];
initial2 = RandomReal[{0.1, 2.0}, n2];
max2 = 100; (* number of simulated days *)
mean2 = ConstantArray[0, n2];
mn2 = MultinormalDistribution[mean2, cm2];
data22 = RandomVariate[mn2, max2];
datai2 = Prepend[data22, initial2];
dataTest = Accumulate[datai2]; (* 101 rows including the initial vector *)
dataTest2 = RandomSample[dataTest]; (* shuffle the rows *)


Now we apply the prediction model from NeN to the unseen data:
pdataTest = prediction[dataTest2];


Here, for the 101 unseen data points, we obtain the predictions based on NeN
and plot them in Figure 4 (left), while the distribution for these unseen data
can be obtained as follows:

DataDistribution[
   Type: SmoothKernel
   Data points: 101
   Input dimension: 1
   Domain: {1.18, 1.26}
   Bandwidth: 0.00315
]

with the following probability density of the predicted values:

InterpolatingFunction[Domain: {{1.18, 1.26}}, Output: scalar][x]   for 1.17755 <= x <= 1.25855
0                                                                   otherwise (True)

and illustrated in Figure 4 (right).


Figure 4: Predicted values (left) and their data distribution (right) in NeN routine for 
unseen data. 
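
The idea behind such a smooth kernel distribution can be sketched in Python as follows. This is not the built-in Mathematica routine: the Gaussian kernel and the normal reference bandwidth rule 1.06 s N^(-1/5), with s the sample standard deviation, are assumptions made for illustration, and the built-in bandwidth choice may differ. The function name gaussian_kde and the sample values are made up.

```python
import math
import statistics


def gaussian_kde(samples, bandwidth=None):
    """Return a density function built from Gaussian kernels centered on samples."""
    n = len(samples)
    if bandwidth is None:
        # Normal reference rule; an assumption, not necessarily the built-in choice.
        bandwidth = 1.06 * statistics.stdev(samples) * n ** (-0.2)

    def density(x):
        total = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        return total / (n * bandwidth * math.sqrt(2.0 * math.pi))

    return density


predicted = [1.18, 1.20, 1.21, 1.22, 1.22, 1.23, 1.25, 1.26]  # made-up values
pdf = gaussian_kde(predicted)
grid = [1.0 + 0.001 * i for i in range(500)]      # covers 1.0 to 1.499
area = sum(pdf(x) for x in grid) * 0.001          # Riemann sum of the density
print(round(area, 3))
```

The Riemann sum serves as a sanity check that the estimated density integrates to approximately one over a range that comfortably covers the predicted values.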


5 Conclusions and future work 


Recently, there has been a proliferation of ML techniques and growing interest
in their applications in finance, where they have been applied to sentiment
analysis of news, trend analysis, and portfolio optimization. This paper has
explored the potential of ML to enhance the investment process. A prediction
model based on ML was discussed in equity markets for the case in which the
number of predictors/features, which are CDS rates, is high over a long
period of time. This is especially relevant to finance, where the ability of
data groups to forecast the values of less liquid instruments is of high
interest. The results obtained in this work are useful as a model for
predicting less liquid CDS rates and can be employed for further
investigation of the dynamic relationship between the VIX index and the CDS
markets. When dealing with multidimensional sets of data, it is necessary to
filter out non-correlated features; it is better to use fewer, highly
correlated features to train a model. Taking this into account may help
improve the robustness of the NeN method, which is under study for further
work in our team.